ABSTRACT


Human life expectancy is increasing along with the world
population, and this creates new challenges for health care.
Big data is changing how data is managed, leveraged, and
analyzed: with predictive analytics, it can help reduce the
costs of treatment, reduce medication, and support better
treatment. Health-related data collected from sources such as
electronic health records (EHR), medical imaging systems,
genomic sequencing, payer records, pharmaceutical research,
and medical devices is referred to as big data in healthcare.
INTRODUCTION

The swift development of new information technologies,
experimental technologies and methods, cloud computing, the
Internet of Things, and social networks has caused the amount
of generated data to rise immensely in many research fields.
Big data is starting to revolutionize the health care
industry: it is changing treatment and pharmaceutical
technology, helping to reduce medication, and helping to
provide better treatment.

Data is a powerful resource that is found in many forms. Big
data does not have a universal definition, and it is discussed
in different ways. The term big data is used to describe the
exponential growth of data flows in various sectors, where the
data is too large to process with available traditional
database and software techniques. Big data is often presumed
to be intimidating, yet it represents an explosion in the
field of information. It enables various kinds of analytics
that can contribute to economic growth, create opportunities,
and give organizations an efficiency advantage over others.
Big data is commonly characterized by six properties, the six
“Vs”: value, volume, velocity, variety, veracity, and
variability, as shown in Figure 1.

Volume: Many factors contribute to data volume. The data can
be transactional data accumulated over the years or data
flowing over social media. Volume is the total quantity of
data held within an organization. The volume of data an
organization generates grows daily at an unpredictable rate
and, depending on its production activities and the type of
organization, can reach petabytes or zettabytes.
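Because volumes are quoted in petabytes and zettabytes, a
small unit-conversion sketch can make the scale concrete. It
assumes decimal SI prefixes (1 PB = 10^15 bytes); binary units
such as PiB would give slightly different figures. The table
and function names are illustrative, not part of any library.

```python
# Decimal (SI) byte units; binary units (KiB, MiB, ...) use powers of 1024 instead.
UNITS = {
    "B": 1,
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def to_bytes(amount: int, unit: str) -> int:
    """Convert a whole-number amount in the given unit to bytes."""
    return amount * UNITS[unit]


def convert(amount: float, src: str, dst: str) -> float:
    """Express an amount given in unit `src` in unit `dst`."""
    return amount * UNITS[src] / UNITS[dst]


if __name__ == "__main__":
    # One zettabyte is a million petabytes.
    print(convert(1, "ZB", "PB"))
```

The gap between a petabyte and a zettabyte is six orders of
magnitude, which is why growth "from petabytes to zettabytes"
is qualitatively different from ordinary database growth.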


Velocity: Velocity refers to data in motion, that is, the data
currently being transmitted within an organization. The speed
at which an organization produces, processes, and analyzes
data normally keeps accelerating. Velocity governs how quickly
data is created and delivered from one point to the next, and
such data is often time-sensitive.
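Because high-velocity data is often time-sensitive, streaming
values are commonly summarized over a moving window as they
arrive rather than stored and analyzed later. The following
minimal sketch keeps a running mean of the most recent
readings (for example, heart-rate samples from a monitoring
device); the class name and window size are hypothetical.

```python
from collections import deque
from typing import Deque


class SlidingWindowMean:
    """Running mean of the most recent `size` readings, O(1) work per reading."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.window: Deque[float] = deque()
        self.total = 0.0

    def add(self, value: float) -> float:
        """Add one reading and return the mean of the current window."""
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            # Evict the oldest reading once the window is full.
            self.total -= self.window.popleft()
        return self.total / len(self.window)


if __name__ == "__main__":
    monitor = SlidingWindowMean(size=3)
    for bpm in [60, 70, 80, 90]:
        print(monitor.add(bpm))
```

For very long streams, the running total can accumulate
floating-point error; periodically recomputing it from the
window contents is a simple remedy.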

Variety: Variety refers to the diversity of data in form,
type, and origin. It defines the complexity of the data and
the ways in which data occurs. Data can be structured,
semi-structured, or unstructured. Examples of structured data
are numerical data, traditional databases, and business
information; examples of unstructured data are audio, video,
and pictures.
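The practical consequence of variety is that each form needs a
different access method. The sketch below reads hypothetical
samples of the three forms for one patient: a CSV table
(structured), a JSON document (semi-structured), and a free
text note (unstructured). All sample values and function names
are illustrative assumptions.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical samples of the three forms of data for one patient.
STRUCTURED = "patient_id,test,value,unit\nP001,glucose,5.4,mmol/L\n"
SEMI_STRUCTURED = '{"patient_id": "P001", "allergies": ["penicillin"], "notes": {"smoker": false}}'
UNSTRUCTURED = "Patient reports mild headache since Tuesday; no fever."


def read_structured(text: str) -> List[Dict[str, str]]:
    """Structured data: every row has the same named columns."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(text: str) -> dict:
    """Semi-structured data: self-describing keys, but nesting and fields vary."""
    return json.loads(text)


def naive_keyword_hit(text: str, keyword: str) -> bool:
    """Unstructured data: no schema, so this falls back to plain substring search."""
    return keyword.lower() in text.lower()


if __name__ == "__main__":
    print(read_structured(STRUCTURED)[0]["value"])  # CSV values are read as strings
    print(read_semi_structured(SEMI_STRUCTURED)["allergies"])
    print(naive_keyword_hit(UNSTRUCTURED, "fever"))
```

Note that the substring search reports a hit for "no fever",
which illustrates why unstructured clinical text usually needs
natural language processing (for example, negation handling)
before it can be analyzed reliably.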

Veracity: Veracity concerns the uncertainty of the data an
organization holds, that is, how far its various forms of data
can be relied on. Organizations' efforts to enact strategies
that ensure high-quality, reliable data are often hindered by
unpredictable factors such as weather, customers' reactions,
and purchasing decisions.
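A common first step toward veracity is rule-based validation
that flags missing or implausible values before analysis. The
sketch below checks hypothetical vital-sign records and
reports the share of records with no detected issues; the
field names and plausibility ranges are assumptions, not
clinical standards.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VitalSigns:
    patient_id: str
    heart_rate: Optional[float]      # beats per minute
    temperature_c: Optional[float]   # degrees Celsius


# Hypothetical plausibility limits; real limits would be set by clinical experts.
PLAUSIBLE_RANGES = {
    "heart_rate": (20.0, 250.0),
    "temperature_c": (30.0, 45.0),
}


def quality_issues(record: VitalSigns) -> List[str]:
    """Return human-readable data-quality problems found in one record."""
    issues: List[str] = []
    if not record.patient_id:
        issues.append("missing patient_id")
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = getattr(record, field)
        if value is None:
            issues.append(f"missing {field}")
        elif not low <= value <= high:
            issues.append(f"implausible {field}={value}")
    return issues


def reliability_score(records: List[VitalSigns]) -> float:
    """Fraction of records with no detected issues (1.0 for an empty batch)."""
    if not records:
        return 1.0
    clean = sum(1 for r in records if not quality_issues(r))
    return clean / len(records)


if __name__ == "__main__":
    batch = [VitalSigns("P1", 72.0, 36.8), VitalSigns("", 500.0, None)]
    for rec in batch:
        print(quality_issues(rec))
    print(reliability_score(batch))
```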

Variability: Variability refers to fluctuations in the data
throughout its handling and lifecycle. Increasing range and
variability also make the data more attractive and raise the
possibility of providing valuable, unforeseen, and hidden
information [20]. Value is the process of extracting valuable
information from huge data sets, usually referred to as big
data analytics. Data value supports proper decision-making.
Value: The value of big data lies in its coherent analysis,
which should be valuable to patients and clinicians.
Fig. 1: The six “Vs” of big data

BIG DATA IN HEALTHCARE 

The scope of defining the volume of data, the types of data,
and the entity limitations is very broad. Taken together,
health care data is very large and is referred to as “big
data”, but it is not so huge that an organization cannot
handle its own data. Most health care providers have not yet
faced serious difficulty in handling their data; however, it
is always good to anticipate technology improvements and
implementations that can help them. According to the McKinsey
Global Institute, better targeting of preventive health care
messages to the right population at the right time could save
$70-100 billion. Although big data might not yet be their
case, discovering new techniques to analyze the data they
have, increasing the accuracy of experimental results, and
providing mechanisms to assess data quality are always the
highest priority. As a result, Hadoop data processing is one
of the best choices given current trends. The computational
capabilities of Hadoop can support currently available
mathematical methods and medical research approaches to
improve the quality of outcomes. “Most of the data systems are
for billing, and they aren't used to improve the quality of
care,” explains Jason Jones, executive director for clinical
intelligence and decision support at Kaiser Permanente, a
health care provider and not-for-profit health plan that
serves approximately 9.1 million members in 8 states and the
District of Columbia. Emerging generic health care systems
usually store and manage EMR (Electronic Medical Record), PHR
(Personal Healthcare Record), Laboratory Information System
(LIS), biomedical, biometric, and genomic data, which can be
invaluable sources for generating outcomes. These data sources
have different characteristics, which must be taken into
account when the data is processed and analyzed. Processing
such massive data sets with Hadoop not only allows faster
processing than the traditional database solutions currently
in use, but also provides an extra edge in analyzing the data
according to its characteristics.
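To make the MapReduce model behind Hadoop concrete, here is a
single-process Python sketch of its three phases (map,
shuffle, reduce) that counts diagnosis codes in patient
records. It does not use the Hadoop API: in a real Hadoop job
the map and reduce functions run in parallel on many nodes
over blocks of data in HDFS, and the framework itself performs
the shuffle. The "patient_id,diagnosis_code" record format and
the ICD-10-style codes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (diagnosis_code, 1) for one 'patient_id,diagnosis_code' line."""
    _patient_id, code = record.split(",", 1)
    yield code.strip(), 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key (here, a simple count)."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases sequentially; Hadoop would distribute them."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())


if __name__ == "__main__":
    lines = ["P1,E11", "P2,I10", "P3,E11"]
    print(run_job(lines))  # {'E11': 2, 'I10': 1}
```

The design point is that map and reduce only see one record or
one key at a time, which is what lets a framework such as
Hadoop split the work across machines.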

Applications of Big Data in Healthcare: 

Big data can be applied in almost all areas of health care
management. Potential application areas include fraud
detection, epidemic spread prediction, omics, clinical
outcomes, medical device design and manufacturing, the
insurance industry, personalized patient care, and
pharmaceutical development. Moreover, big data is widely
adopted in personalized health care, which offers an
individual-centric approach.

Big Data in 'Omics':

“Omics” data refer to large datasets in the biological and
molecular fields (e.g., proteomics, metabolomics,
macrobiotics, genomics). Applying big data in this area aims
to understand disease mechanisms and to increase the
specificity of medical treatments (e.g., “precision
medicine”). With advances in metabolomics, proteomics,
genomics, and other omics technologies over recent decades, a
remarkable volume of molecular biology data has been
generated. Genomics is the study of genes and their functions.
Applying big data in genomics can help prevent or cure
diseases and deliver personalized care to each patient. This
area is still emerging, with applications concentrated in
particular areas such as leukemia, diabetes, and cancer.
Pathway analysis is widely used for high-throughput,
genome-scale data, and pathway analysis methods fall into
three generations. First-generation tools include ClueGO,
Onto-Express, and GoMiner. The most popular second-generation
tool is GSEA, and an example of a third-generation tool is
Pathway-Express. Proteomics is the study of the proteome, its
structures and functions; a proteome is the entire set of
proteins in a cell. ExPASy (http://www.expasy.org/proteomics)
lists dozens of proteomics databases and over 100 tools. Big
data applications in proteomics are expected to play a major
role in predicting and preventing human cancer. FindMod and
CSS-Palm are frequently used for predicting post-translational
modifications (PTMs). Metabolomics is the systematic study of
chemical processes involving metabolites. The BiGG database
provides genome-based reconstructions of human metabolism for
systems biology.
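Over-representation analysis, the approach usually associated
with first-generation pathway tools, asks whether a list of
genes of interest contains more members of a pathway than
chance would predict, commonly using a one-sided
hypergeometric test:
P(X >= k) = sum over i from k to min(K, n) of
C(K, i) * C(N - K, n - i) / C(N, n),
where N is the number of background genes, K the number of
pathway genes, n the size of the gene list, and k the overlap.
The sketch below computes this p-value. It is a generic
illustration, not the implementation used by ClueGO,
Onto-Express, or GoMiner, and the example counts are made up.

```python
from math import comb


def ora_p_value(total_genes: int, pathway_genes: int, list_genes: int, overlap: int) -> float:
    """One-sided hypergeometric P(X >= overlap).

    total_genes   N: genes in the background (e.g. all measured genes)
    pathway_genes K: background genes annotated to the pathway
    list_genes    n: genes in the list of interest (e.g. differentially expressed)
    overlap       k: genes in the list that are also in the pathway
    """
    upper = min(pathway_genes, list_genes)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap must lie between 0 and min(pathway_genes, list_genes)")
    # Exact integer arithmetic for the tail sum; math.comb returns 0 when n - i
    # exceeds the number of non-pathway genes.
    tail = sum(
        comb(pathway_genes, i) * comb(total_genes - pathway_genes, list_genes - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, list_genes)


if __name__ == "__main__":
    # Hypothetical: 20,000 background genes, 100 in the pathway, 200 genes of
    # interest, 10 of which fall in the pathway (about 1 expected by chance).
    print(ora_p_value(20_000, 100, 200, 10))
```

In practice, many pathways are tested at once, so the
resulting p-values are normally corrected for multiple
testing before any pathway is reported as enriched.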

Insurance Industry / Payer: 

Health care insurance companies and payers use big data in
underwriting, fraud detection, and claims management.
Insurance providers are moving beyond claim-centric
algorithmic fraud detection practices to person-centric ones:
for example, examining how many related claims have been
submitted by the same person, or whether the same treatment
has been claimed with different insurance companies.
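A person-centric check of the kind described above can be
sketched by grouping claims by person rather than examining
each claim alone. The claim fields and identifiers below are
hypothetical, and a match is only a lead for review: a patient
covered by two plans may legitimately bill both.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Claim(NamedTuple):
    person_id: str
    insurer: str
    treatment_code: str
    service_date: str  # ISO 8601 date, e.g. "2024-03-01"


Key = Tuple[str, str, str]  # (person_id, treatment_code, service_date)


def same_treatment_multiple_insurers(claims: Iterable[Claim]) -> Dict[Key, List[str]]:
    """Same person, treatment and date billed to more than one insurer."""
    insurers_by_key: Dict[Key, Set[str]] = defaultdict(set)
    for claim in claims:
        key = (claim.person_id, claim.treatment_code, claim.service_date)
        insurers_by_key[key].add(claim.insurer)
    return {key: sorted(names) for key, names in insurers_by_key.items() if len(names) > 1}


def claims_per_person(claims: Iterable[Claim]) -> Dict[str, int]:
    """How many claims each person has submitted across all insurers."""
    counts: Dict[str, int] = defaultdict(int)
    for claim in claims:
        counts[claim.person_id] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        Claim("P1", "InsurerA", "T100", "2024-03-01"),
        Claim("P1", "InsurerB", "T100", "2024-03-01"),
        Claim("P2", "InsurerA", "T200", "2024-03-02"),
    ]
    print(same_treatment_multiple_insurers(sample))
    print(claims_per_person(sample))
```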

Medical Device Design and Manufacturing: 

Big data enables a wider set of device materials, delivery
methods, tissue interactions, and anatomical configurations to
be evaluated. Computational techniques and big data can play a
significant role in medical device design and manufacturing.

Pharmaceuticals: 

Big data is used during all phases of pharmaceutical
development, particularly for drug discovery. Pfizer has
recently initiated the Precision Medicine Analytics
Environment program, which connects the dots among electronic
medical record data, clinical trial data, and genomic data to
identify opportunities to rapidly deliver innovative medicines
to particular patient populations.
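"Connecting the dots" across sources amounts, at its simplest,
to joining records on a shared patient identifier. The toy
sketch below selects patients who have a given diagnosis in
EMR data and a given variant in genomic data and who are not
yet enrolled in a trial. It is not a description of Pfizer's
program; the identifiers, diagnosis code "D1", and marker
"variant_A" are hypothetical, and real record linkage also
requires consent and careful identity matching.

```python
from typing import Dict, List, Set

# Hypothetical, already-linked sources keyed by a shared patient identifier.
EMR: Dict[str, dict] = {
    "P1": {"diagnosis": "D1"},
    "P2": {"diagnosis": "D1"},
    "P3": {"diagnosis": "D2"},
}
GENOMIC: Dict[str, dict] = {
    "P1": {"variant_A": True},
    "P2": {"variant_A": False},
    "P3": {"variant_A": True},
}
TRIAL_ENROLLED: Set[str] = {"P4"}


def candidate_population(
    emr: Dict[str, dict],
    genomic: Dict[str, dict],
    enrolled: Set[str],
    diagnosis: str,
    variant: str,
) -> List[str]:
    """Patients with the diagnosis (EMR) and the variant (genomic), not yet enrolled."""
    return sorted(
        pid
        for pid, record in emr.items()
        if record.get("diagnosis") == diagnosis
        and genomic.get(pid, {}).get(variant, False)
        and pid not in enrolled
    )


if __name__ == "__main__":
    print(candidate_population(EMR, GENOMIC, TRIAL_ENROLLED, "D1", "variant_A"))  # ['P1']
```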

Personalized Patient Care:

Big data will make it possible to deliver the best,
personalized patient care. In the near future, new big
data-derived insights will prompt timely updates to diagnostic
assistance, clinical guidelines, and patient triage, allowing
more specific and personalized treatment that improves medical
outcomes for patients.

Privacy and security 

Two important issues for big data in health care and medicine
are the security and privacy of individuals/patients. All
medical data are highly sensitive, and various countries
consider these data to be legally owned by the patients. To
address these security and privacy challenges, big data
analytics software solutions should use advanced encryption
algorithms and pseudonymization of personal data. These
solutions should provide network-level security and
authentication for all involved users, guarantee privacy and
security, and establish good governance standards and
practices.
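One common way to pseudonymize identifiers is a keyed hash
(HMAC): the same ID and key always produce the same token, so
records can still be linked for analysis, while anyone without
the key cannot recompute tokens. The sketch below uses
Python's standard hmac and hashlib modules; the record fields
and the set of direct identifiers are assumptions, and key
storage and rotation are out of scope. Pseudonymized data is
not anonymous: whoever holds the key can re-identify records
by recomputing tokens for known IDs.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Iterable


def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """HMAC-SHA256 token for an identifier (64 hex characters).

    Deterministic for a given key, so linkage across records still works;
    re-identification is possible only for someone who holds the key.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(
    record: Dict[str, str],
    secret_key: bytes,
    id_field: str = "patient_id",
    direct_identifiers: Iterable[str] = ("name", "address", "phone"),
) -> Dict[str, str]:
    """Drop assumed direct identifiers and replace the patient ID with its token."""
    dropped = set(direct_identifiers) | {id_field}
    result = {k: v for k, v in record.items() if k not in dropped}
    result["pseudonym"] = pseudonymize_id(record[id_field], secret_key)
    return result


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # in practice, kept in a separate key store
    print(pseudonymize_record({"patient_id": "P1", "name": "A. Patient", "diagnosis": "D1"}, key))
```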

Conclusion and future work 

Big data analytics in medicine and health care is a very
promising process of integrating, exploring, and analyzing
large amounts of complex, heterogeneous data of different
natures: biomedical data, experimental data, electronic health
record data, and social media data. Integrating such diverse
data makes big data analytics intertwine several fields, such
as bioinformatics, medical imaging, sensor informatics,
medical informatics, health informatics, and computational
biomedicine. As future work, the characteristics of big data
provide a very appropriate basis for using promising software
platforms to develop applications that can handle big data in
medicine and health care. One such platform is the open-source
distributed data processing platform Apache Hadoop MapReduce,
which uses massively parallel processing (MPP). These
applications should enable data mining techniques to be
applied to such heterogeneous and complex data to reveal
hidden patterns and novel knowledge. Recent hardware
innovations in processor technology and newer kinds of memory
and network architectures will minimize the time spent moving
data from storage to the processor in a distributed setting.